
Conversation

@inisis
Contributor

@inisis inisis commented Oct 28, 2025

What does this PR do?

Type of change:

Add onnxslim support

Overview: Onnxslim is under active development and committed to long-term support; it is easy to use and depends on very few packages.

Usage

$ python -m modelopt.onnx.quantization --onnx_path=$MODEL_NAME.onnx --simplify
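For reference, a minimal Python-API sketch of the same flow. This assumes modelopt.onnx.quantization exposes a quantize() entry point mirroring the CLI flags; the simplify keyword name is an assumption based on the new --simplify flag, not confirmed in this PR:

```python
from modelopt.onnx.quantization import quantize

# Hypothetical Python equivalent of the CLI above; `simplify` mirrors
# the --simplify flag added by this PR (assumed keyword name).
quantize(onnx_path="model.onnx", simplify=True)
```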

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@inisis inisis requested review from a team as code owners October 28, 2025 11:49
@copy-pr-bot

copy-pr-bot bot commented Oct 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@inisis
Contributor Author

inisis commented Nov 1, 2025

@gcunhase Hi, any update here? Thanks.

@gcunhase
Contributor

gcunhase commented Nov 3, 2025

@gcunhase Hi, any update here? Thanks.

@inisis Thank you for your contribution. I'm investigating potential gaps between onnxsim and onnxslim and will get back to you as soon as possible.

@gcunhase
Contributor

gcunhase commented Nov 6, 2025

@inisis I'm still validating onnxslim on our end, but in the meantime, could you please check that switching to onnxslim doesn't break quantization of https://github.com/NVIDIA/DL4AGX/tree/master/AV-Solutions/bevformer-int8-eq?

Specifically, please check that the following CLI is still functional and performant:

$ python -m modelopt.onnx.quantization --onnx_path=/mnt/models/bevformer_tiny_epoch_24_cp2_op13.onnx \
      --trt_plugins=$PLUGIN_PATH \
      --op_types_to_exclude MatMul \
      --calibration_data_path=/workspace/BEVFormer_tensorrt/data/nuscenes/calib_data.npz \
      --simplify

Thanks!

@inisis
Contributor Author

inisis commented Nov 11, 2025

Hi @gcunhase, it took me some time to run bevformer-int8-eq, but everything is working fine. Here are the results:

Env

device: NVIDIA GeForce RTX 5090
pytorch-quantization      2.2.1
torch                     2.9.0+cu128
torchvision               0.24.0+cu128
onnx                      1.17.0
onnx_graphsurgeon         0.5.8
onnx-ir                   0.1.12
onnxconverter-common      1.16.0
onnxruntime-gpu           1.20.2
onnxscript                0.5.6
onnxsim                   0.4.36
onnxslim                  0.1.74

Without simplify

[benchmark screenshot]

With onnxsim

[benchmark screenshot]

With onnxslim

[benchmark screenshot]

To conclude:

| Method | FPS | Acceleration Ratio |
| --- | --- | --- |
| Without simplify | 354 | 1.00× |
| With onnxsim | 371 | 1.05× |
| With onnxslim | 381 | 1.08× |

In terms of GPU Compute Time (median, ms), however, onnxsim is slightly faster. I compared the two models using:

onnxslim --inspect /mnt/models/bevformer_tiny_epoch_24_cp2_op13.quant_sim.onnx /mnt/models/bevformer_tiny_epoch_24_cp2_op13.quant_slim.onnx
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Name          |     bevformer_tiny_epoch_24_cp2_op13     |     bevformer_tiny_epoch_24_cp2_op13     |
|                              |             .quant_sim.onnx              |             .quant_slim.onnx             |
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Info          |       Op Set: 13 / IR Version: 10        |       Op Set: 13 / IR Version: 10        |
+------------------------------+------------------------------------------+------------------------------------------+
|          IN: image           |       float32: (1, 6, 3, 480, 800)       |       float32: (1, 6, 3, 480, 800)       |
|         IN: prev_bev         |         float32: (2500, 1, 256)          |         float32: (2500, 1, 256)          |
|       IN: use_prev_bev       |              float32: (1,)               |              float32: (1,)               |
|         IN: can_bus          |              float32: (18,)              |              float32: (18,)              |
|        IN: lidar2img         |          float32: (1, 6, 4, 4)           |          float32: (1, 6, 4, 4)           |
|        OUT: bev_embed        |         float32: (2500, 1, 256)          |         float32: (2500, 1, 256)          |
|     OUT: outputs_classes     |         float32: (6, 1, 900, 10)         |         float32: (6, 1, 900, 10)         |
|     OUT: outputs_coords      |         float32: (6, 1, 900, 10)         |         float32: (6, 1, 900, 10)         |
+------------------------------+------------------------------------------+------------------------------------------+
|             Add              |                   318                    |                   185                    |
|             Atan             |                    1                     |                    1                     |
|             Clip             |                    26                    |                    26                    |
|            Concat            |                    16                    |                    16                    |
|             Conv             |                    55                    |                    55                    |
|             Cos              |                    1                     |                    1                     |
|       DequantizeLinear       |                   175                    |                   393                    |
|             Div              |                    67                    |                    67                    |
|            Gather            |                    14                    |                    14                    |
|             Gemm             |                    7                     |                   140                    |
|           Greater            |                    3                     |                    3                     |
|             Less             |                    2                     |                    2                     |
|             Log              |                    15                    |                    15                    |
|            MatMul            |                   142                    |                    11                    |
|             Max              |                    1                     |                    1                     |
|           MaxPool            |                    1                     |                    1                     |
|             Mul              |                    81                    |                    81                    |
| MultiScaleDeformableAttnTRT2 |                    12                    |                    12                    |
|             Pow              |                    41                    |                    41                    |
|        QuantizeLinear        |                   175                    |                   393                    |
|          ReduceMean          |                    81                    |                    81                    |
|          ReduceProd          |                    1                     |                    1                     |
|          ReduceSum           |                    4                     |                    4                     |
|             Relu             |                    96                    |                    96                    |
|           Reshape            |                   105                    |                   269                    |
|          RotateTRT2          |                    1                     |                    1                     |
|          ScatterND           |                    58                    |                    58                    |
|           Sigmoid            |                    18                    |                    18                    |
|             Sign             |                    2                     |                    2                     |
|             Sin              |                    1                     |                    1                     |
|            Slice             |                    84                    |                    84                    |
|           Softmax            |                    5                     |                    5                     |
|            Split             |                    1                     |                    0                     |
|             Sqrt             |                    40                    |                    40                    |
|           Squeeze            |                    1                     |                    1                     |
|             Sub              |                    59                    |                    59                    |
|             Tile             |                    6                     |                    6                     |
|          Transpose           |                    36                    |                    36                    |
|          Unsqueeze           |                    30                    |                    30                    |
|            Where             |                    5                     |                    5                     |
+------------------------------+------------------------------------------+------------------------------------------+
|          Model Size          |                158.77 MB                 |                158.90 MB                 |
+------------------------------+------------------------------------------+------------------------------------------+

Onnxslim merges MatMul + Add into Gemm, which is undesirable when using --op_types_to_exclude MatMul.
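If needed, this fusion could likely be skipped via onnxslim's skip_fusion_patterns option (used later in this thread as a WAR for FusionConvBN). A sketch, assuming the MatMul + Add -> Gemm pattern is named "FusionGemm" (the pattern name is an assumption, not confirmed here):

```python
import onnxslim

# Hypothetical: keep MatMul + Add unfused so that
# --op_types_to_exclude MatMul still matches the original MatMul nodes.
# "FusionGemm" is an assumed pattern name.
model = onnxslim.slim(
    "bevformer_tiny_epoch_24_cp2_op13.onnx",
    skip_fusion_patterns=["FusionGemm"],
)
```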

@gcunhase
Contributor


Hi @inisis, thanks for validating this functionality. Were you also able to validate the numerical accuracy of the onnxslim-simplified model?

I will also investigate the MatMul+Add vs. Gemm substitution on my end in the meantime.

Thanks!

@inisis
Contributor Author

inisis commented Nov 11, 2025


@gcunhase I didn't use the full nuScenes dataset, as it's too big; I used the mini one for calibration. If this counts, I can verify it on the mini one.

@gcunhase
Contributor


No problem, let me try to verify the accuracy on my end. Thank you!

@inisis
Contributor Author

inisis commented Nov 18, 2025

Hi @gcunhase , is there any update? Thanks

@gcunhase
Contributor

@inisis we appreciate your contribution and wanted to make sure that there are no regressions before merging this PR. We've investigated potential risks in ~150 models and compiled a list of issues, divided into 3 categories, that would need to be solved before merging.

All mentioned models and scripts are in the zip file: repro.zip

1. Functional failures

Error logs

Error 1: repro_io_tensors_shape_dtype.onnx

Graph input and output tensors must include dtype information. Please set the dtype attribute for: Variable (NMS): (shape=None, dtype=None)) 

Error 2: repro_mode_error_mobilenetv1.onnx

Fail - onnxSLIM (onnxSLIM: 'mode') 

How to repro

import onnx
import onnxslim

# Path to one of the failing models from repro.zip.
input_model_path = "repro_io_tensors_shape_dtype.onnx"

model = onnx.load(input_model_path)
simplified_model = onnxslim.slim(model)

2. ORT inference failures

Error logs

Error 1: repro_mul_incompatible_dimensions.onnx

Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from repro_mul_incompatible_dimensions.onnx failed:Node (/stages.1/stages.1.0/Mul) Op (Mul) [ShapeInferenceError] Incompatible dimensions 

Error 2: repro_gemm_invalid_shape.onnx

Fail: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gemm node. Name:'/transformer/decoder/layers.0/attentions.1/attn/Gemm' Status Message: Gemm: Invalid bias shape for broadcast 

How to repro

Run the check_ort_failures.py Python script (updating input_model_path as needed).

3. ORT numerical accuracy failures

Error logs

The simplified versions of the following models do not produce the same outputs as the original model for the same input data:

  • issue3_repro_conv_bn_fusion.onnx
    • WAR: skip_fusion_patterns=["FusionConvBN"]
  • issue3_repro_conv_resize_issue.onnx
    • WAR: none found.

How to repro

Run the check_ort_failures.py Python script (updating input_model_path as needed).

--
Please let us know if there are any additional questions on any of the items.
Thanks!

@inisis
Contributor Author

inisis commented Nov 22, 2025

@gcunhase We greatly appreciate your comprehensive testing, which has helped us improve onnxslim. All the issues you mentioned have been resolved in onnxslim version 0.1.75, and these models have also been added to onnxslim's daily CI. Many thanks again.

Here are some details on how the issues were solved:

1. Functional failures

If a model ends with a custom operator as its output, onnxslim cannot perform symbolic shape inference for it, so the output loses its dtype and shape. We improved this by reusing the information already stored in the original model. Users can also provide custom shape inference logic for their own functions; onnxslim supports this and provides a template for it.

2. ORT inference failures

In onnxslim, shape inference for the outputs of the Resize node followed the official ONNX documentation:
https://onnx.ai/onnx/operators/onnx__Resize.html#summary
In the official doc, the output size is floored:

output_dimension = floor(input_dimension * (roi_end - roi_start) * scale)

whereas in onnxruntime, the output size is rounded:
https://github.com/microsoft/onnxruntime/blob/977efe4788b2ee24371523b5fa14dd02efcd4942/onnxruntime/core/providers/cpu/tensor/upsample.cc#L70

So there was a mismatch that, in some cases, produced an incompatible-dimensions issue; onnxslim is now aligned with ORT.
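A tiny plain-Python illustration of the mismatch (not onnxslim or ORT code; the dimensions are arbitrary examples):

```python
import math

# The ONNX spec floors the scaled output size, while ORT's resize
# rounds it, so the two can disagree near .5 boundaries.
input_dim, scale = 11, 0.5                # arbitrary example values
spec_dim = math.floor(input_dim * scale)  # 5, per the ONNX spec
ort_dim = int(input_dim * scale + 0.5)    # 6, rounded as in ORT
print(spec_dim, ort_dim)                  # 5 6 -> shape mismatch downstream
```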

3. ORT numerical accuracy failures

There is a precision issue with issue3_repro_conv_resize_issue.onnx: check_ort_failures.py uses np.array_equal, which is very strict. I checked the maximum diff, which is 3.5762787e-07, and if tested with

opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

the np.array_equal check passes, so I suspect some ORT optimization is responsible for this numerical diff.
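For reference, a sketch of the comparison described above (assumes static input shapes; the .slim.onnx file name is a hypothetical onnxslim output, not a file from repro.zip):

```python
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
# With the default level (ORT_ENABLE_ALL) the outputs diverge by
# ~3.6e-07; limiting optimizations to EXTENDED makes them bit-exact.
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

sess_a = ort.InferenceSession("issue3_repro_conv_resize_issue.onnx", opts)
sess_b = ort.InferenceSession("issue3_repro_conv_resize_issue.slim.onnx", opts)

# Feed identical random inputs to both models.
feed = {
    i.name: np.random.rand(*i.shape).astype(np.float32)
    for i in sess_a.get_inputs()
}
out_a = sess_a.run(None, feed)
out_b = sess_b.run(None, feed)

print(all(np.array_equal(a, b) for a, b in zip(out_a, out_b)))          # strict
print(all(np.allclose(a, b, atol=1e-6) for a, b in zip(out_a, out_b)))  # tolerant
```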

Contributor

@gcunhase gcunhase left a comment


@inisis we appreciate your speedy and detailed reply!

I was able to verify that all cases now pass with v0.1.75 and that disabling layout optimizations in ORT solves the numerical accuracy issue observed in the last model. This is achieved by adding the following line in our comparison script (as you suggested):

session_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED

Approved.

@gcunhase
Contributor

@kevalmorabia97 can you please update the CHANGELOG file? I'm not sure which ModelOpt version this update will be included in.

My suggestion would be something like: Replace ONNX simplification package from 'onnxsim' to 'onnxslim'.

Thanks.

Collaborator

@kevalmorabia97 kevalmorabia97 left a comment


Thanks for your contribution. Great to have a better ONNX simplification package! Will wait for CI/CD to pass and then merge.

@kevalmorabia97 kevalmorabia97 changed the title feat: add onnxslim support Replace ONNX simplification package from onnxsim to onnxslim Nov 26, 2025
@codecov

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.78%. Comparing base (768ee6a) to head (3b1e46c).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| modelopt/onnx/quantization/quantize.py | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #478      +/-   ##
==========================================
+ Coverage   74.76%   74.78%   +0.02%     
==========================================
  Files         183      183              
  Lines       18630    18626       -4     
==========================================
+ Hits        13929    13930       +1     
+ Misses       4701     4696       -5     


@kevalmorabia97
Collaborator

@inisis there is a conflict when installing with torch 2.6:

The conflict is caused by:
    torch 2.6.0 depends on sympy==1.13.1; python_version >= "3.9"
    onnxruntime-gpu 1.22.0 depends on sympy
    onnxslim 0.1.75 depends on sympy>=1.13.3

@inisis
Contributor Author

inisis commented Nov 26, 2025

There is a sympy version conflict

The conflict is caused by:
    torch 2.6.0 depends on sympy==1.13.1; python_version >= "3.9"
    onnxruntime-gpu 1.22.0 depends on sympy
    onnxslim 0.1.75 depends on sympy>=1.13.3

@kevalmorabia97
Collaborator

Can onnxslim relax its required sympy version?

@inisis
Contributor Author

inisis commented Nov 26, 2025

@inisis there is conflict installing with torch2.6

The conflict is caused by:
    torch 2.6.0 depends on sympy==1.13.1; python_version >= "3.9"
    onnxruntime-gpu 1.22.0 depends on sympy
    onnxslim 0.1.75 depends on sympy>=1.13.3

Yes, I will check it ASAP. I don't understand why PyTorch needs to pin SymPy to version 1.13.1.

@inisis
Contributor Author

inisis commented Nov 26, 2025

@kevalmorabia97 the latest PyTorch requires sympy>=1.13.3: https://github.com/pytorch/pytorch/blob/main/pyproject.toml#L47

There is also a conflict in onnxslim's CI, but it didn't break the pipeline.
https://github.com/inisis/OnnxSlim/actions/runs/19593268908/job/56114498429#step:4:52

realAsma and others added 8 commits November 26, 2025 21:04
…AutoQuantizeGradientSearcher; separated quant modules and score modules (NVIDIA#586)

## What does this PR do?

**Type of change:**  Refactor; Minor new feature

**Overview:** ?

1. Refactored AutoQuantizeSearcher to _AutoQuantizeBaseSearcher &
AutoQuantizeGradientSearcher - Prepares architecture for additional
search methods.
2. separated quant modules and score modules - separate quantization
modules from scoring modules, enabling auto-quantization to measure
sensitivity at parent layers (e.g., MLP output for MoE experts) rather
than individual ops.
3. Also see NVIDIA#592
and NVIDIA#588

## Testing
See unittests; `tests/unit/torch/quantization/test_autoquant.py` and
`tests/unit/torch/quantization/plugins/test_huggingface.py`

## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->

- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Not Required

## Additional Information
<!-- E.g. related issue. -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Added support for score modules in quantization workflows.
  * Added optional naming for quantization recipes.

* **Bug Fixes**
* Improved quantization grouping rules documentation with clearer
configuration examples.

* **Refactor**
  * Renamed quantization module parameters for improved clarity.
  * Enhanced quantization search architecture for better scalability.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: realAsma <[email protected]>
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
Signed-off-by: inisis <[email protected]>
Signed-off-by: inisis <[email protected]>
Signed-off-by: inisis <[email protected]>
Signed-off-by: inisis <[email protected]>
Signed-off-by: inisis <[email protected]>
@inisis inisis force-pushed the main branch 2 times, most recently from 77d6bf9 to 18c27dd Compare November 26, 2025 13:07
@inisis inisis requested review from a team as code owners November 26, 2025 13:07
@inisis inisis requested a review from realAsma November 26, 2025 13:07
@kevalmorabia97 kevalmorabia97 removed request for a team and realAsma November 26, 2025 17:45
@kevalmorabia97
Collaborator

/ok to test 3b1e46c

@kevalmorabia97 kevalmorabia97 merged commit 261858c into NVIDIA:main Nov 26, 2025
27 checks passed